TURBO-CODE DECODER



BACKGROUND OF THE INVENTION 

Field of Invention 

[0001] The present invention generally relates to a decoder, and more particularly to a
fast turbo-code decoder. The decoder is built from systolic array very-large-scale
integration (VLSI) circuits, in which the output of each level serves as the input of the
next level, so the advantages of parallel and pipelined calculation are fully exploited.
The decoding speed is markedly improved over the calculation time of the conventional
decoder: it is about 5*(N+M) times faster, where N stands for the block length and M
stands for the register size.

Description of Related Art 

[0002] Error control coding is widely used in communication systems and computer
storage media. Berrou, Glavieux and Thitimajshima first proposed the turbo code, whose
error-correcting capability approaches the Shannon limit, in 1993 (C. Berrou, A. Glavieux,
and P. Thitimajshima, "Near Shannon Limit Error-correcting Coding and Decoding:
Turbo-codes (1)," in Proc. ICC'93, May 1993). Because of its excellent error-correcting
capability, the turbo code is widely applied in general communication systems such as
CDMA transmission systems. However, with the conventional decoding algorithm, if the
transmission block length is too small, the error-correcting capability is poor. On the
other hand, if the transmission block length is too large, the decoding delay becomes
intolerable for a communication system that requires real-time processing. Solving this
problem is therefore important for meeting the requirements of current high-speed
communication.



SUMMARY OF THE INVENTION 
[0003] To solve the problem mentioned above, and to increase the computing speed and
thus the throughput, the present invention provides a structure design using parallel and
systolic array VLSI.

[0004] In this structure design, the decoder is built from systolic array VLSI circuits.
Since the output of each level can be used as the input of the next level, the advantages
of parallel and pipelined calculation are fully exploited. The latency is only N+M+2
units of time, about 1/5 of the 5*(N+M) units of time taken by the conventional
sequential calculation structure. The decoding throughput is about 5*(N+M) times higher
than that of the conventional decoder. Although the gate count of the circuit is also
about 5*(N+M) times higher than that of the conventional decoder, VLSI technology has
improved steadily, so the hardware complexity is easy to accommodate. Spending hardware
cost to obtain higher speed is expected to remain a lasting trend.

[0005] In order to achieve the objective mentioned above, the present invention uses a
parallel and systolic array VLSI structure design to provide a turbo-code decoder for a
communication system. The decoder comprises a serial-to-parallel output unit and a
plurality of parallel decoding units. The serial-to-parallel output unit receives a
serial input signal, converts it, and outputs a parallel signal. The parallel decoding
units are serially connected to form a plurality of levels. The first-level parallel
decoding unit receives the parallel signal output from the serial-to-parallel output
unit. The output of the first-level parallel decoding unit is sent to the second-level
parallel decoding unit, and in this order the parallel signal passes through the parallel
decoding units for the decoding process.
[0006] In the turbo-code decoder mentioned above, each parallel decoding unit receives an
extrinsic parameter when performing the decoding process, generates an updated extrinsic
parameter as the result of that decoding process, and sends the updated extrinsic
parameter to the next-level parallel decoding unit.

[0007] In the turbo-code decoder mentioned above, the extrinsic parameter is obtained
from a deinterleaving operation. The extrinsic parameter of the first-level parallel
decoding unit is $L_{a0,k} = (0, 0, \ldots, 0)$, where k = 1, 2, ..., N and N is the block
length of the turbo code.

[0008] In the turbo-code decoder mentioned above, the serial input signals are the
$r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$ messages of the turbo code, where k = 1, 2, ..., N
and N is the block length of the turbo code.

[0009] In the turbo-code decoder mentioned above, the serial-to-parallel output unit
receives $r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$, where the subscript k = 0, 1, ..., N+M-1
covers the whole block and the end message, and M stands for the register size of the
turbo-code decoder. The serial-to-parallel output unit converts the received $r_{1s,k}$,
$r_{1p,k}$, and $r_{2p,k}$ messages and outputs the results to the first-level parallel
decoding unit in parallel. The first-level parallel decoding unit also receives an
extrinsic parameter $L_{a,k}$ at the same time. $L_{a,k}$ is the parameter obtained by a
deinterleaving operation on the previous-level extrinsic parameter $\Lambda(d_k)$. The
initial value of the extrinsic parameter of the first-level decoding unit is set as
$L_{a0,k} = (0, 0, \ldots, 0)$, and a first-level extrinsic parameter $L_{a1,k}$ is
generated by the first-level parallel decoding unit. The messages $r_{1s,k}$, $r_{1p,k}$,
and $r_{2p,k}$ are passed through in sequence to be the input of the next level.

[0010] In the turbo-code decoder mentioned above, the parallel decoding unit comprises a
first decoder, a second decoder, an interleaving unit, and a deinterleaving unit. The
first decoder receives the $r_{1s,k}$ and $r_{1p,k}$ messages and the extrinsic parameter
$L_{a,k}$. The second decoder receives the $r_{2p,k}$ message and its extrinsic
parameter. The interleaving unit is placed between the first decoder and the second
decoder and receives the output of the first decoder. The deinterleaving unit is
connected to the second decoder and alternately outputs the output of the first decoder
and the second decoder.
[0011] In the turbo-code decoder mentioned above, the first decoder of the parallel
decoding units constitutes a systolic array VLSI circuit structure.

[0012] In the turbo-code decoder mentioned above, the systolic array VLSI circuit is
composed of N+M units each of modules C, A, B, D, and E. Module C receives $L_{a1,k}$,
$r_{1s,k}$, and $r_{1p,k}$, and outputs $\gamma_k^{(1)}(m',m)$ and $\gamma_k^{(0)}(m',m)$.
Module A calculates a forward recursive probability parameter $\alpha_k$. Module B
calculates a backward recursive probability parameter $\beta_k$. Module D uses N+M units
calculating in parallel to obtain $\Lambda(d_k)$ after the calculation of $\alpha_k$,
$\beta_k$, and $\gamma_k^{(i)}$ is finished. Module E outputs the values calculated by
module D, k = 0, 1, ..., N+M-1.

[0013] In the turbo-code decoder mentioned above, the value of $\Lambda(d_k)$ is
calculated according to a MAP algorithm and the following equation:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}$$

[0014] Here, $\alpha_k$ is the forward recursive probability parameter, $\beta_k$ is the
backward recursive probability parameter, and $\gamma_k^{(i)}$ is a branch probability
parameter.
[0015] In the turbo-code decoder mentioned above, the forward recursive probability
parameter $\alpha_k$ is obtained from the previous parameter $\alpha_{k-1}$ and the
branch probability parameter $\gamma_k^{(i)}$ by the following equation:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}$$

[0016] In the turbo-code decoder mentioned above, the backward recursive probability
parameter $\beta_k$ is obtained from the next parameter $\beta_{k+1}$ and the branch
probability parameter $\gamma_{k+1}^{(i)}$ by the following equation:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m,m')\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\alpha_k(m')}$$

[0017] In the turbo-code decoder mentioned above, the branch probability parameter
$\gamma_k^{(i)}$ is obtained from the following equation according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot q(d_k = i \mid s_k = m, s_{k-1} = m') \cdot \Pr\{s_k = m \mid s_{k-1} = m'\}$$

[0018] Here, the probability parameter $q(d_k = i \mid s_k = m, s_{k-1} = m')$ is 1 if the
input bit $d_k = i$ (0 or 1) takes the encoder from state m' to state m, and 0 otherwise.

BRIEF DESCRIPTION OF THE DRAWINGS



[0019] The accompanying drawings are included to provide a further understanding of
the invention, and are incorporated in and constitute a part of this specification. The
drawings illustrate embodiments of the invention and, together with the description,
serve to explain the principles of the invention. In the drawings,

[0020] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC
encoders;

[0021] FIG. 2 schematically shows the decoding structure of the turbo code;

[0022] FIG. 3 schematically shows the structure of the P-level parallel decoding units
(Level 1, Level 2, ..., Level P);

[0023] FIG. 4 schematically shows the structure of the first-level decoding unit of the
parallel decoding units in FIG. 3;

[0024] FIG. 5 schematically shows the systolic array VLSI structure that makes up the
first-level decoding unit of the parallel decoding units in FIG. 4;

[0025] FIG. 6 schematically shows the structure of the simplified modules, data streams,
and latches of the parallel decoding units in FIG. 3 when N=4 and M=3;

[0026] FIG. 7 schematically shows the calculation structure of the branch probability
parameter $\gamma_k^{(i)}(m',m)$;

[0027] FIG. 8 schematically shows the structure of module A for calculating $\alpha_k$;

[0028] FIG. 9 schematically shows the structure of module B for calculating $\beta_k$;

[0029] FIG. 10 schematically shows the structure of module D for calculating
$\Lambda(d_k)$;

[0030] FIG. 11 schematically shows the structure of the calculation submodule L (using
an analog circuit);

[0031] FIG. 12 schematically shows the structure of the fast RSC encoder, wherein
$G_b = 1011$ and $G_d = 1110$;

[0032] FIG. 13 schematically shows the trellis diagram;

[0033] FIG. 14 schematically shows the detailed structure of module A (wherein the
submodule L is designed as a digital circuit);

[0034] FIG. 15 schematically shows the detailed structure of module D;

[0035] FIG. 16 schematically shows the latency for completing a message of one block
length; and

[0036] FIG. 17 schematically shows the comparison of the bit error rate, wherein the
iterative decoding number P=6, the code rate R=1/3, the register size M=3, the generator
parameters are $G_b = 1011$ and $G_d = 1110$, and the 256*256 random interleaving method
is adopted by the first decoder and the second decoder.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0037] The present invention provides a structure design adopting parallel and systolic
array VLSI, in which the decoder is built from systolic array VLSI circuits. Since the
output of each level can be used as the input of the next level, the advantages of
parallel and pipelined calculation are fully exploited. The latency is only N+M+2 units
of time, about 1/5 of the 5*(N+M) units of time taken by the conventional sequential
calculation structure. The decoding throughput is about 5*(N+M) times higher than that of
the conventional decoder. Although the gate count of the circuit is also about 5*(N+M)
times higher than that of the conventional decoder, VLSI technology has improved
steadily, so the hardware complexity is easy to accommodate. Spending hardware cost to
obtain higher speed is expected to remain a lasting trend.

[0038] Berrou, Glavieux and Thitimajshima first proposed the turbo code, whose
error-correcting capability approaches the Shannon limit, in 1993 (C. Berrou, A.
Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-correcting Coding and
Decoding: Turbo-codes (1)," in Proc. ICC'93, May 1993). The encoding structure comprises
two parallel recursive systematic convolutional encoders (hereafter abbreviated as RSC).
Its important characteristics are: (1) two convolutional codes with the same structure
encode in parallel, so the receiving end is able to decode the message iteratively; (2)
non-uniform random interleaving is used to increase the minimum distance between
codewords (S. Benedetto and G. Montorsi, "Role of Recursive Convolutional Codes in Turbo
Codes," Electron. Lett., Vol. 31, No. 11, pp. 858-859, 1995); and (3) soft-in soft-out
decoding.

[0039] Because of the characteristics mentioned above, the error-correcting capability
of the turbo code is excellent, and the turbo code is therefore widely applied in general
communication systems such as CDMA transmission systems (J. Blanz, P. Jung, and M.
Naßhan, "Realistic Simulations of CDMA Mobile Radio Systems Using Joint Detection and
Coherent Receiver Antenna Diversity," IEEE Third International Symposium on Spread
Spectrum Techniques and Applications, Oulu, Finland, 1994).

[0040] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC
encoders. The input bit sequence is represented as
$d = (d_1, d_2, d_3, \ldots, d_k, \ldots, d_N)$, where $d_k$ is the input bit of the
encoder at time k, k runs from 1 to N, and N is the block size. The output of the encoder
at time k is represented as $c_k = (x_k, y_{1k}, y_{2k})$. Since the encoder is
systematic, $x_k = d_k$, and the redundant code outputs are $y_{1k}$ and $y_{2k}$. The
decoding structure of the turbo code is shown in FIG. 2. The decoder 200 comprises two
recursive decoding units 210 and 220, which are connected through the interleaving and
deinterleaving units shown as 212, 214, and 216 in the diagram.
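The interleaving and deinterleaving operations between the two decoding units can be illustrated with a small sketch. It is only an illustration: the helper names `make_interleaver`, `interleave`, and `deinterleave` are hypothetical, and a generic pseudo-random permutation stands in for the specific interleaver of the embodiment.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n) (hypothetical helper)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(seq, perm):
    """Output position j takes the input symbol at position perm[j]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Undo interleave(): put the symbol at position j back at perm[j]."""
    out = [None] * len(seq)
    for j, p in enumerate(perm):
        out[p] = seq[j]
    return out


perm = make_interleaver(8, seed=1)
data = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1, -2.2, 0.1]
restored = deinterleave(interleave(data, perm), perm)
```

Deinterleaving with the same permutation restores the original order, which is what allows the extrinsic values of one decoder to be realigned with the input order of the other.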

[0041] It is assumed that the noise of the communication channel is Gaussian. It is
further assumed that the noise on each transmitted symbol is independent, with
expectation value 0 and variance $N_0/2$. Binary modulation is used: if the input bit
$d_k$ is 0, the modulated value is -1.0; if the input bit $d_k$ is 1, the modulated value
is +1.0. Therefore, the received vector sequence R is represented as
$R = (r_1, r_2, r_3, \ldots, r_k, \ldots, r_N)$, and the kth symbol is represented as

$$r_k = (r_{1s,k},\, r_{1p,k},\, r_{2p,k}) = (2x_k - 1 + n_{1s,k},\; 2y_{1k} - 1 + n_{1p,k},\; 2y_{2k} - 1 + n_{2p,k})$$
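The channel model of this paragraph can be sketched in a few lines. This is a minimal illustration under the stated assumptions (binary modulation to -1/+1, independent zero-mean Gaussian noise of variance N_0/2); the function name `transmit` and its parameters are hypothetical.

```python
import random


def transmit(bits, n0, rng):
    """Map bits to -1/+1 and add Gaussian noise of variance n0 / 2."""
    sigma = (n0 / 2.0) ** 0.5
    return [2 * b - 1 + rng.gauss(0.0, sigma) for b in bits]


rng = random.Random(0)
noiseless = transmit([0, 1, 1, 0], 0.0, rng)  # no noise: pure -1/+1 symbols
noisy = transmit([0, 1, 1, 0], 1.0, rng)      # N_0 = 1, variance 0.5
```

The same mapping would be applied separately to the systematic stream and to each of the two parity streams to obtain the three components of each received symbol.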
[0042] Here, $n_{1s,k}$, $n_{1p,k}$, and $n_{2p,k}$ are the noise on the channels
$r_{1s}$, $r_{1p}$, and $r_{2p}$ at time k respectively, and they are mutually
independent. The details of the Maximum A Posteriori (hereafter abbreviated as MAP)
algorithm proposed by BCJR (L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal
Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Trans. Inform. Theory,
Vol. IT-20, pp. 284-287, March 1974) are not repeated here; only the result of the MAP
algorithm is described. The objective of the MAP algorithm is to calculate, for each
input bit $d_k$, the ratio of the A Posteriori Probability (hereafter abbreviated as APP)
that the bit is 1 to the APP that it is 0, where k = 0, 1, 2, ..., N-1. From the
derivation of the turbo code by Berrou, Glavieux and Thitimajshima mentioned above, the
following equation is obtained:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)} \qquad (1)$$

Here, $\alpha_k$ is the forward recursive probability parameter, $\beta_k$ is the
backward recursive probability parameter, and $\gamma_k^{(i)}$ is the branch probability
parameter. As the name suggests, the forward recursive probability parameter $\alpha_k$
is obtained from the previous parameter $\alpha_{k-1}$ and the branch probability
parameter $\gamma_k^{(i)}$ by the following equation:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')} \qquad (2)$$

The backward recursive probability parameter $\beta_k$ is obtained from the next
parameter $\beta_{k+1}$ and the branch probability parameter $\gamma_{k+1}^{(i)}$ by the
following equation:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m,m')\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\alpha_k(m')} \qquad (3)$$

The branch probability parameter $\gamma_k^{(i)}$ is obtained from the following equation
according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot q(d_k = i \mid s_k = m, s_{k-1} = m') \cdot \Pr\{s_k = m \mid s_{k-1} = m'\} \qquad (4)$$

Here, the probability parameter $q(d_k = i \mid s_k = m, s_{k-1} = m')$ is 1 if the input
bit $d_k = i$ (0 or 1) takes the encoder from state m' to state m, and 0 otherwise.
[0043] In a sequential calculation decoder, it is assumed that each $\Lambda(d_k)$ in
equation (1) needs one unit of time, where k runs from 0 to N+M-1, N stands for the block
length of the transmission, and M stands for the register size of the decoder. It is
further assumed that each of $\alpha_k$, $\beta_k$, and $\gamma_k^{(i)}$ in equations (2),
(3), and (4) needs one unit of time, where i = 0 or 1. Five quantities ($\Lambda(d_k)$,
$\alpha_k$, $\beta_k$, $\gamma_k^{(0)}$, and $\gamma_k^{(1)}$) are thus calculated for each
of the N+M values of k, so the first-level decoder needs 5*(N+M) units of time. With a
decoding algorithm such as the Viterbi algorithm (A.J. Viterbi, "Error Bounds for
Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Trans.
Inform. Theory, vol. IT-13, pp. 260-269, Apr. 1967) (A.J. Viterbi and J.K. Omura,
"Principles of Digital Communication and Coding," New York: McGraw-Hill, 1979) or the
BCJR algorithm mentioned above, if N is too small, the error-correcting capability is
poor. However, if N is too large, the decoding delay becomes intolerable for a
communication system that requires real-time processing.

[0044] As mentioned in the previous paragraph, the decoding algorithm decides each bit
from the value of $\Lambda(d_k)$ in equation (1): if $\Lambda(d_k) > 0$, $d_k = 1$;
otherwise, $d_k = 0$. To calculate each $\Lambda(d_k)$ in equation (1), the $\alpha_k$,
$\beta_k$, and $\gamma_k^{(i)}$ in equations (2), (3), and (4) must be calculated first.
A sequential calculation decoder needs 5*(N+M) units of time for this (G. Masera, G.
Piccinini, M.R. Roch, and M. Zamboni, "VLSI Architectures for Turbo Codes," IEEE Trans.
on VLSI Systems, vol. 7, no. 3, pp. 369-379, Sep. 1999).
[0045] In order to increase the calculation speed and thus the throughput, a preferred
embodiment of the present invention adopts the parallel and systolic array VLSI
structure design. The whole decoder circuit is composed of P levels of parallel decoding
units. The structure is shown in FIG. 3. A serial-in parallel-out unit placed before the
first level receives the messages $r_{1s,k}$, $r_{1p,k}$, and $r_{2p,k}$, where the
subscript k = 0, 1, ..., N+M-1 covers the whole block and the end message. Its output is
sent to the first-level decoding unit. The other input of the first-level decoding unit
is $L_{a,k}$, the parameter obtained by deinterleaving the previous-level extrinsic
parameter $\Lambda(d_k)$; the initial value of the 0th-level extrinsic parameter is set
as $L_{a0,k} = (0, 0, \ldots, 0)$. The first-level extrinsic parameter $L_{a1,k}$ is
generated by the first-level decoding unit, and the messages $r_{1s,k}$, $r_{1p,k}$, and
$r_{2p,k}$ are passed through in sequence to be the input of the next level.

[0046] Each level of the decoding unit comprises two decoders, the first decoder and the
second decoder shown in FIG. 4; the structure of the first decoder is similar to that of
the second decoder. The whole systolic array VLSI structure is shown in FIG. 5, where N
and M can be adjusted according to the design requirement. For ease of description, a
block length N=4 and a register size M=3 are used as an example. FIG. 6 schematically
shows the structure of the simplified modules, data streams, and latches. It is apparent
to those skilled in the art that this embodiment is only an example and does not limit
the scope of application of the present invention.

[0047] According to the literature (I.L. Turner, "A Modified Bahl Algorithm for
Recursive Systematic Convolutional Codes on Rayleigh Fading Channels," IEEE 49th
Vehicular Technology Conference, pp. 75-76, vol. 1, 1999), the a priori probability of
the input bit $d_k$ calculated by the previous-level decoder is represented as

$$\Pr\{s_k = m \mid s_{k-1} = m'\} = \frac{e^{L(d_k)}}{1 + e^{L(d_k)}}, \quad \text{if } q(d_k = 1 \mid s_k = m, s_{k-1} = m') = 1 \qquad (5)$$

$$\Pr\{s_k = m \mid s_{k-1} = m'\} = \frac{1}{1 + e^{L(d_k)}}, \quad \text{if } q(d_k = 0 \mid s_k = m, s_{k-1} = m') = 1 \qquad (6)$$

Here, $L(d_k)$ is the log-likelihood ratio (LLR) extrinsic parameter calculated for the
message bit $d_k$ by the previous-level decoder. Assuming an AWGN channel, the
conditional probabilities in equation (4) are calculated as follows:
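Equations (5) and (6) convert the extrinsic LLR L(d_k) from the previous level into the a priori probabilities of the two bit values. A minimal sketch follows; the function name `apriori_probabilities` is hypothetical, and no protection against overflow for very large LLR magnitudes is included.

```python
import math


def apriori_probabilities(llr):
    """Return (Pr{d_k = 0}, Pr{d_k = 1}) from the extrinsic LLR L(d_k)."""
    e = math.exp(llr)
    p1 = e / (1.0 + e)    # equation (5): transitions driven by d_k = 1
    p0 = 1.0 / (1.0 + e)  # equation (6): transitions driven by d_k = 0
    return p0, p1


# First level: the extrinsic parameter is initialised to 0, so both bits are
# equally likely a priori.
p0_init, p1_init = apriori_probabilities(0.0)

# A positive LLR favours d_k = 1.
p0_biased, p1_biased = apriori_probabilities(math.log(3.0))
```

With the all-zero initial extrinsic parameter of the first level, both probabilities equal 1/2, so the first pass relies on the channel observations alone.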



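$$p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1s}} \exp\left[-\frac{(r_{1s,k} - \mu_{r1s})^2}{2\sigma_{r1s}^2}\right] \qquad (7)$$

$$p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1p}} \exp\left[-\frac{(r_{1p,k} - \mu_{r1p}(m',m))^2}{2\sigma_{r1p}^2}\right] \qquad (8)$$

Here, $\mu_{r1s}$ and $\mu_{r1p}(m',m)$ are the expectation values of $r_{1s}$ and
$r_{1p}$ respectively. $\mu_{r1s}$ depends on the input bit, while $\mu_{r1p}(m',m)$
depends on the input bit and also on the previous state and the current state.
$\sigma_{r1s}^2$ and $\sigma_{r1p}^2$ are the variances of $r_{1s}$ and $r_{1p}$
respectively. It is assumed that the variances of $r_{1s}$ and $r_{1p}$ are the same,
denoted $\sigma^2$. Therefore, the above two equations can be multiplied and consolidated
as follows: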



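$$p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}\left((r_{1s,k} - \mu_{r1s})^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2\right)\right] \qquad (9)$$

For a discrete memoryless Gaussian channel, the branch probability parameters
$\gamma_k^{(1)}$ and $\gamma_k^{(0)}$, for input bit 1 and 0 respectively, can be
calculated from equations (4), (5), (6), and (9) as follows: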



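$$\gamma_k^{(1)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}\left((r_{1s,k} - 1)^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2\right)\right] \cdot \frac{e^{L(d_k)}}{1 + e^{L(d_k)}} \qquad (10)$$

$$\gamma_k^{(0)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}\left((r_{1s,k} + 1)^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2\right)\right] \cdot \frac{1}{1 + e^{L(d_k)}} \qquad (11)$$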
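The branch probability for a single trellis transition can be sketched as follows. This is an illustrative sketch only: the function name `branch_gamma` and its arguments are hypothetical, the transition is assumed to be valid for the given bit (q = 1), and the expected parity value `mu_r1p` is supplied by the caller because it depends on the trellis.

```python
import math


def branch_gamma(r1s, r1p, mu_r1p, llr, sigma2, bit):
    """Branch probability for one valid transition.

    bit = 1 follows the form of equation (10), bit = 0 that of equation (11).
    """
    e = math.exp(llr)
    if bit == 1:
        mu_r1s = 1.0          # systematic symbol is +1 for d_k = 1
        prior = e / (1.0 + e)
    else:
        mu_r1s = -1.0         # systematic symbol is -1 for d_k = 0
        prior = 1.0 / (1.0 + e)
    dist = (r1s - mu_r1s) ** 2 + (r1p - mu_r1p) ** 2
    return prior * math.exp(-dist / (2.0 * sigma2)) / (2.0 * math.pi * sigma2)


# Received pair exactly at (+1, +1), no a priori preference, unit variance.
g1 = branch_gamma(1.0, 1.0, 1.0, 0.0, 1.0, bit=1)
g0 = branch_gamma(1.0, 1.0, 1.0, 0.0, 1.0, bit=0)
```

In this example the hypothesis d_k = 1 matches the received systematic value exactly, so `g1` exceeds `g0` by the factor exp(2) that comes from the squared distance of 4 between -1 and +1.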



[0048] According to equations (10) and (11), the branch probability parameters
$\gamma_k^{(i)}(m',m)$ can be calculated in parallel. N+M units of module C (as shown in
FIG. 7) are used to calculate every $\gamma_k^{(i)}(m',m)$ in parallel. Thus, the N+M
units of time can be shortened to one unit of time. The input signals of module C in
FIG. 7 are $L_{a,k}$, $r_{1s,k}$, and $r_{1p,k}$, where k = 1, ..., N+M. Module C
calculates $\gamma_k^{(1)}(m',m)$ and $\gamma_k^{(0)}(m',m)$.

[0049] In addition, since the forward recursive probability parameter $\alpha_k$ output
from the previous level is the input of the next level, and the backward recursive
probability parameter $\beta_k$ output from the next level is the input of the previous
level, the calculation is well suited to a systolic array VLSI design, which increases
the calculation speed. According to equation (2), N+M units of module A (as shown in
FIG. 8) are used to calculate $\alpha_k$. The first-level inputs
$\gamma_1^{(1)}(m',m)$ and $\gamma_1^{(0)}(m',m)$ and the initial value of the forward
recursive probability parameter $\alpha_0(m)$ are used to calculate $\alpha_1(m)$. The
second-level inputs $\gamma_2^{(1)}(m',m)$ and $\gamma_2^{(0)}(m',m)$ and $\alpha_1(m)$
are used to calculate $\alpha_2(m)$. Thus, the systolic array is able to work
simultaneously. All $\alpha_k(m)$, where k = 1, ..., N+M, can be calculated after N+M
units of time.
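The chain of module A units can be sketched as a loop in which each step consumes the output of the previous one, as in equation (2). This is an illustrative sketch only: `forward_step` is a hypothetical name, and a toy two-state trellis with made-up branch values stands in for the eight-state trellis of the embodiment.

```python
def forward_step(alpha_prev, gamma):
    """One module-A step: equation (2) evaluated for every state m.

    alpha_prev[mp] is alpha_{k-1}(m'); gamma[i][mp][m] is gamma_k^(i)(m', m).
    """
    n_states = len(alpha_prev)
    unnorm = [
        sum(gamma[i][mp][m] * alpha_prev[mp]
            for mp in range(n_states) for i in (0, 1))
        for m in range(n_states)
    ]
    total = sum(unnorm)  # denominator of equation (2)
    return [u / total for u in unnorm]


# Toy trellis (assumption): bit 0 keeps the state, bit 1 toggles it.
gamma = [
    [[0.6, 0.0], [0.0, 0.6]],  # gamma^(0)(m', m)
    [[0.0, 0.2], [0.2, 0.0]],  # gamma^(1)(m', m)
]

alphas = [[1.0, 0.0]]          # alpha_0: the encoder starts in state 0
for _ in range(3):             # three chained module-A units
    alphas.append(forward_step(alphas[-1], gamma))
```

Because each step depends only on its predecessor's output and its own branch values, the steps map naturally onto a chain of identical hardware units.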

[0050] According to equation (3), N+M units of module B (as shown in FIG. 9) are used to
calculate $\beta_k$. The first-level inputs $\gamma_{N+M}^{(1)}(m',m)$ and
$\gamma_{N+M}^{(0)}(m',m)$ and the initial value of the backward recursive probability
parameter $\beta_{N+M}(m)$ are used to calculate $\beta_{N+M-1}(m)$. The second-level
inputs $\gamma_{N+M-1}^{(1)}(m',m)$ and $\gamma_{N+M-1}^{(0)}(m',m)$ and the backward
recursive probability parameter $\beta_{N+M-1}(m)$ are used to calculate
$\beta_{N+M-2}(m)$. The advantage is that the structure of each module is the same and
the output of the previous level is the input of the next level. Thus, the throughput is
(N+M) times the original throughput.
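The backward chain of module B units mirrors the forward chain, starting from the end of the block. The sketch below is illustrative only: `backward_step` is a hypothetical name, the two-state trellis and its branch values are made up, and the result is simply rescaled to sum to one, which is a simplifying assumption rather than the exact denominator of equation (3). A common scale factor does not change the ratio in equation (1).

```python
def backward_step(beta_next, gamma_next):
    """One module-B step: the numerator of equation (3), then rescaling.

    beta_next[mp] is beta_{k+1}(m'); gamma_next[i][m][mp] is
    gamma_{k+1}^(i)(m, m'), the branch from state m to state m'.
    """
    n_states = len(beta_next)
    unnorm = [
        sum(gamma_next[i][m][mp] * beta_next[mp]
            for mp in range(n_states) for i in (0, 1))
        for m in range(n_states)
    ]
    total = sum(unnorm)  # simple rescaling (assumption, see lead-in)
    return [u / total for u in unnorm]


# Toy trellis (assumption): bit 0 keeps the state, bit 1 toggles it.
gamma = [
    [[0.6, 0.0], [0.0, 0.6]],  # gamma^(0)
    [[0.0, 0.2], [0.2, 0.0]],  # gamma^(1)
]

betas = [[1.0, 0.0]]           # final beta: the trellis ends in state 0
for _ in range(3):             # three chained module-B units, run backwards
    betas.insert(0, backward_step(betas[0], gamma))
```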






[0051] When the calculation of $\alpha_k$, $\beta_k$, and $\gamma_k^{(i)}$ is completed,
N+M units of module D (as shown in FIG. 10) are used to calculate $\Lambda(d_k)$
according to equation (1). By using parallel calculation, the N+M units of time are
shortened to one unit of time.
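The calculation performed by one module D unit, equation (1), followed by the hard decision used at the last level, can be sketched as follows. The function name `log_likelihood_ratio` and the two-state example values are hypothetical and serve only to illustrate the formula.

```python
import math


def log_likelihood_ratio(alpha_prev, beta, gamma):
    """Equation (1): LLR of d_k from alpha_{k-1}, beta_k, and gamma_k.

    gamma[i][mp][m] is gamma_k^(i)(m', m).
    """
    states = range(len(beta))
    num = sum(gamma[1][mp][m] * alpha_prev[mp] * beta[m]
              for m in states for mp in states)
    den = sum(gamma[0][mp][m] * alpha_prev[mp] * beta[m]
              for m in states for mp in states)
    return math.log(num / den)


# Toy two-state values (assumption): bit 0 keeps the state, bit 1 toggles it.
gamma = [
    [[0.6, 0.0], [0.0, 0.6]],  # gamma^(0)
    [[0.0, 0.2], [0.2, 0.0]],  # gamma^(1)
]
llr = log_likelihood_ratio([1.0, 0.0], [0.5, 0.5], gamma)
decided_bit = 1 if llr > 0 else 0  # hard decision at the last level
```

Since every value of k uses its own inputs, the N+M evaluations are independent of one another and can run side by side.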

[0052] The submodule L, located in module A and module B, calculates the product-sum of
two inputs. In the example shown in FIG. 11, the submodule L adopts an analog circuit
provided by conventional techniques. The analog circuits proposed in the reference
literature can also be used, for example H.-A. Loeliger, F. Lustenberger, F. Tarkoy, M.
Helfenstein, "Decoding in Analog VLSI," IEEE Communications Magazine, Vol. 37(4),
pp. 99-101, Apr. 1999; H.-A. Loeliger, F. Lustenberger, M. Helfenstein, F. Tarkoy,
"Probability Propagation and Decoding in Analog VLSI," IEEE Trans. on Information
Theory, Vol. 47(2), pp. 837-843, Feb. 2001; or F. Lustenberger, M. Helfenstein, H.-A.
Loeliger, F. Tarkoy, G.S. Moschytz, "An Analog VLSI Decoding Technique for Digital
Codes," ISCAS '99, Proceedings of the 1999 IEEE International Symposium on Circuits and
Systems, Vol. 2, pp. 424-427, 1999.
[0053] To describe the detailed structure of modules A, B, and D mentioned above, the
preferred embodiment of the present invention uses the turbo code of the
third-generation CDMA mobile communication standard as an example. However, this example
does not limit the scope of application of the present invention. The turbo code of the
third-generation CDMA mobile communication standard uses a register size M=3. For the
first decoder and the second decoder, the code rate is R=1/3, and the parameters of the
feedback generator and the feed-forward generator are $G_b = 1011$ and $G_d = 1110$
respectively. FIG. 12 shows the recursive systematic convolutional (RSC) encoder, which
adopts the fast RSC encoder; for the details of the fast RSC encoder, please refer to
the "Fast Turbo-code Encoder" proposed by the same inventor in April 2001. The trellis
diagram is shown in FIG. 13.

[0054] FIG. 6 schematically shows the structure of the simplified modules, data streams,
and latches when the block length N=4 and the register size M=3. There are N+M=7 units
each of modules A, B, C, and D. In the first unit of time, the parallel input signals
$L_{a,k}$, $r_{1s,k}$, and $r_{1p,k}$, k = 1, 2, ..., 7, are used simultaneously to
calculate $\gamma_1^{(i)}, \gamma_2^{(i)}, \ldots, \gamma_7^{(i)}$. Over the following 7
units of time, the forward parameters $\alpha_k$ and the backward parameters
$\beta_1, \beta_2, \ldots, \beta_6$ are calculated. In one further unit of time,
according to equation (1), the parallel inputs $\gamma_k^{(1)}(m',m)$,
$\gamma_k^{(0)}(m',m)$, $\alpha_{k-1}$, and $\beta_k$ are used to calculate
$\Lambda(d_k)$. $\Lambda(d_k)$ is used as the extrinsic parameter of the next level. When
the last level is reached, $d_k$ is determined accordingly: if $\Lambda(d_k) > 0$,
$d_k = 1$; otherwise $d_k = 0$.

[0055] Based on the trellis diagram of FIG. 13, the structure of modules A, B, and D is
easy to simplify. FIG. 14 schematically shows the detailed structure of module A based on
this design. The detailed structure of module B is similar to that of module A. The
detailed structure of module D is shown in FIG. 15.

[0056] As shown in FIG. 16, the latency for completing a message of one block length
with the parallel and systolic array VLSI structure design of the preferred embodiment
is N+M+2 units of time. Compared with the conventional sequential calculation structure,
which needs 5*(N+M) units of time, the time is shortened to about 1/5. Furthermore,
after the first set of $d_k$ is generated, the systolic array VLSI structure design is
able to generate a further set of $d_k$ in every unit of time.

The performance comparison is shown in Table 1:

Table 1: Structure comparison of the systolic array type and the sequential type

Item                  Sequential type          Systolic array type
Latency               5*(N+M) units of time    N+M+2 units of time
Decoding throughput   1 (reference)            about 5*(N+M) times
Circuit gate count    1 (reference)            about 5*(N+M) times
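The latency figures can be checked with simple arithmetic. The sketch below only evaluates the two expressions given in the description, 5*(N+M) and N+M+2; the function names are hypothetical.

```python
def sequential_latency(n, m):
    """Units of time for the conventional sequential structure."""
    return 5 * (n + m)


def systolic_latency(n, m):
    """Units of time for the parallel and systolic array structure."""
    return n + m + 2


# Example of FIG. 6: N = 4, M = 3.
small = (sequential_latency(4, 3), systolic_latency(4, 3))

# Block length used in the simulation: N = 65536, M = 3.
ratio = systolic_latency(65536, 3) / sequential_latency(65536, 3)
```

For N=4 and M=3 this gives 35 units against 9 units. For long blocks the constant 2 becomes negligible and the ratio approaches 1/5.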



[0057] To demonstrate the error-correcting performance of the preferred embodiment
according to the present invention, the CDMA mobile communication system mentioned above
is used as an example. The RSC encoder with register size M=3 is shown in FIG. 12. The
trellis diagram is shown in FIG. 13. The iterative decoding number is P=6. The random
interleaving method is adopted between the first decoder and the second decoder. The
simulation result is shown in FIG. 17, where the block length N=65536, the vertical axis
is the decoding performance expressed as the bit error rate (BER), and the horizontal
axis is the communication environment expressed as the signal-to-noise ratio. As can be
seen, at the same signal-to-noise ratio, the larger the iterative decoding number, the
better the decoding performance. This agrees with the theory and is similar to the
simulation results disclosed in the literature: C. Berrou, A. Glavieux, and P.
Thitimajshima, "Near Shannon Limit Error-correcting Coding and Decoding: Turbo-codes
(1)," in Proc. ICC'93, May 1993, and P. Robertson, "Illuminating the Structure of Code
and Decoder of Parallel Concatenated Recursive Systematic (Turbo) Codes," in Proc. IEEE
GLOBECOM Conf., San Francisco, CA, pp. 1298-1303, Dec. 1994.

[0058] The simulation was written in the C language and run on a personal computer with
a GenuineIntel Pentium® III CPU and 128 MB of RAM under the Windows Me® operating system.
FIG. 17 shows the bit error rate comparison, with iterative decoding number p = 1, ..., 6,
code rate R=1/3, register size M=3, generator parameters $G_b = 1011$ and $G_d = 1110$,
and the 256*256 random interleaving and deinterleaving method.

[0059] The present invention provides a fast turbo-code decoder built from systolic
array VLSI circuits. Since the output of each level can be used as the input of the next
level, the advantages of parallel and pipelined calculation are fully exploited. The
latency is only N+M+2 units of time, about 1/5 of the 5*(N+M) units of time taken by the
conventional sequential calculation structure. The decoding throughput is about 5*(N+M)
times higher than that of the conventional decoder. Although the gate count of the
circuit is also about 5*(N+M) times higher than that of the conventional decoder, VLSI
technology has improved steadily, so the hardware complexity is easy to accommodate.
Spending hardware cost to obtain higher speed is expected to remain a lasting trend.
[0060] Although the invention has been described with reference to a particular
embodiment thereof, it will be apparent to one of ordinary skill in the art that
modifications to the described embodiment may be made without departing from the spirit
of the invention. Accordingly, the scope of the invention will be defined by the attached
claims and not by the above detailed description.






